A Preferential Attachment Model for the Stellar Initial Mass Function
Accurate specification of a likelihood function is becoming increasingly
difficult in many inference problems in astronomy. As sample sizes resulting
from astronomical surveys continue to grow, deficiencies in the likelihood
function lead to larger biases in key parameter estimates. These deficiencies
result from the oversimplification of the physical processes that generated the
data, and from the failure to account for observational limitations.
Unfortunately, realistic models often do not yield an analytical form for the
likelihood. The estimation of a stellar initial mass function (IMF) is an
important example. The stellar IMF is the mass distribution of stars initially
formed in a given cluster of stars, a population that is not directly
observable due to stellar evolution, other disruptions of the cluster, and
observational limitations. There are several difficulties with specifying a
likelihood in this setting since the physical processes and observational
challenges result in measurable masses that cannot legitimately be considered
independent draws from an IMF. This work improves inference of the IMF by using
an approximate Bayesian computation approach that both accounts for
observational and astrophysical effects and incorporates a physically-motivated
model for star cluster formation. The methodology is illustrated via a
simulation study, demonstrating that the proposed approach can recover the true
posterior in realistic situations, and is then applied to observations from
astrophysical simulation data.
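The approximate Bayesian computation (ABC) idea behind this abstract can be illustrated with a minimal rejection-sampling sketch. Everything here is a toy stand-in: a single-slope power-law IMF, a crude mean-log-mass summary statistic, and no observational effects, whereas the paper's forward model handles cluster formation and selection in detail.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_imf(alpha, n, m_min=0.5, m_max=100.0):
    # Inverse-CDF draws from a power-law IMF, dN/dm proportional to m^-alpha
    u = rng.random(n)
    a = 1.0 - alpha
    return (m_min**a + u * (m_max**a - m_min**a)) ** (1.0 / a)

def summary(m):
    # Crude one-number summary of a cluster; a realistic version would fold
    # in completeness, stellar evolution, and measurement error
    return np.log(m).mean()

# "Observed" cluster drawn with a Salpeter-like slope of 2.35
s_obs = summary(sample_imf(2.35, 500))

# ABC rejection: draw slopes from the prior, simulate clusters, and keep
# slopes whose simulated summary lands within epsilon of the observed one
accepted = [a for a in rng.uniform(1.5, 3.5, 5000)
            if abs(summary(sample_imf(a, 500)) - s_obs) < 0.02]
posterior = np.array(accepted)
```

The accepted slopes approximate draws from the posterior without ever evaluating a likelihood, which is the core appeal of ABC when the likelihood has no analytical form.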
High-Dimensional Density Ratio Estimation with Extensions to Approximate Likelihood Computation
The ratio between two probability density functions is an important component
of various tasks, including selection bias correction, novelty detection and
classification. Recently, several estimators of this ratio have been proposed.
Most of these methods fail if the sample space is high-dimensional, and hence
require a dimension reduction step, the result of which can be a significant
loss of information. Here we propose a simple-to-implement, fully nonparametric
density ratio estimator that expands the ratio in terms of the eigenfunctions
of a kernel-based operator; these functions reflect the underlying geometry of
the data (e.g., submanifold structure), often leading to better estimates
without an explicit dimension reduction step. We show how our general framework
can be extended to address another important problem, the estimation of a
likelihood function in situations where that function cannot be
well-approximated by an analytical form. One is often faced with this situation
when performing statistical inference with data from the sciences, due to the
complexity of the data and of the processes that generated those data. We
emphasize applications where using existing likelihood-free methods of
inference would be challenging due to the high dimensionality of the sample
space, but where our spectral series method yields a reasonable estimate of the
likelihood function. We provide theoretical guarantees and illustrate the
effectiveness of our proposed method with numerical experiments.
Comment: With supplementary material
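A bare-bones version of the spectral series estimator described above can be sketched in a few lines: estimate the eigenfunctions of a Gaussian kernel operator on the denominator sample, extend them to new points by the Nystrom formula, and read off the expansion coefficients of the ratio as averages over the numerator sample. The bandwidth, truncation level, and 1-D Gaussian data are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

def gauss_kernel(a, b, h):
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2 * h * h))

# Samples from the two densities whose ratio p/q we want
x_q = rng.normal(0.0, 1.0, 500)    # denominator sample
x_p = rng.normal(0.5, 1.0, 500)    # numerator sample

h, J, n = 0.5, 10, len(x_q)

# Eigenfunctions of the kernel operator, estimated on the q-sample
K = gauss_kernel(x_q, x_q, h)
lam, vec = np.linalg.eigh(K)
lam, vec = lam[::-1][:J], vec[:, ::-1][:, :J]   # keep the top-J components

def psi(x):
    # Nystrom extension of the eigenfunctions to new points x
    return np.sqrt(n) * gauss_kernel(x, x_q, h) @ vec / lam

# Expansion coefficients of the ratio: since the eigenfunctions are
# orthonormal w.r.t. q, the j-th coefficient is the mean of psi_j under p
beta = psi(x_p).mean(axis=0)

def ratio(x):
    return np.clip(psi(x) @ beta, 0.0, None)
```

Because the expansion adapts to where the q-sample actually lives, the same code runs unchanged on high-dimensional inputs; only the kernel's distance computation changes.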
Prototype selection for parameter estimation in complex models
Parameter estimation in astrophysics often requires the use of complex
physical models. In this paper we study the problem of estimating the
parameters that describe star formation history (SFH) in galaxies. Here,
high-dimensional spectral data from galaxies are appropriately modeled as
linear combinations of physical components, called simple stellar populations
(SSPs), plus some nonlinear distortions. Theoretical data for each SSP is
produced for a fixed parameter vector via computer modeling. Though the
parameters that define each SSP are continuous, optimizing the signal model
over a large set of SSPs on a fine parameter grid is computationally infeasible
and inefficient. The goal of this study is to estimate the set of parameters
that describes the SFH of each galaxy. These target parameters, such as the
average ages and chemical compositions of the galaxy's stellar populations, are
derived from the SSP parameters and the component weights in the signal model.
Here, we introduce a principled approach of choosing a small basis of SSP
prototypes for SFH parameter estimation. The basic idea is to quantize the
vector space and effective support of the model components. In addition to
greater computational efficiency, we achieve better estimates of the SFH target
parameters. In simulations, our proposed quantization method obtains a
substantial improvement in estimating the target parameters over the common
method of employing a parameter grid. Sparse coding techniques are not
appropriate for this problem without proper constraints, while constrained
sparse coding methods perform poorly for parameter estimation because their
objective is signal reconstruction, not estimation of the target parameters.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS500 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
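The quantization idea can be sketched with a toy template library and plain k-means, whose centroids play the role of the prototypes. The two "SSP parameters" and the spectral shapes below are invented; the point is only that a small learned basis quantizes the effective support of a large fine-grid library.

```python
import numpy as np

rng = np.random.default_rng(2)

# Mock SSP library: each "spectrum" is a smooth curve controlled by two
# hypothetical parameters (a log-age-like t and a metallicity-like z) laid
# out on a fine grid, mimicking computationally produced templates
wave = np.linspace(0.0, 1.0, 200)
tt, zz = np.meshgrid(np.linspace(0, 1, 40), np.linspace(0, 1, 25))
params = np.column_stack([tt.ravel(), zz.ravel()])
library = np.array([np.exp(-wave * (1 + 3 * t)) + 0.3 * z * np.sin(6 * wave)
                    for t, z in params])         # 1000 spectra x 200 pixels

def kmeans(X, k, iters=30):
    # Plain Lloyd's algorithm: the k centroids act as prototypes that
    # quantize the vector space spanned by the model components
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return centers, labels

prototypes, labels = kmeans(library, k=12)
err_proto = ((library - prototypes[labels]) ** 2).sum()  # quantization error
```

Fitting a signal model over 12 prototypes rather than 1000 grid templates is what delivers the computational savings the abstract describes.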
Reinterpreting Fundamental Plane Correlations with Machine Learning
This work explores the relationships between galaxy sizes and related
observable galaxy properties in a large volume cosmological hydrodynamical
simulation. The objectives of this work are both to develop a better
understanding of the correlations among galaxy properties and of the influence
of environment on galaxy physics, and to build an improved model for galaxy
sizes that builds off of the fundamental plane. With an accurate
intrinsic galaxy size predictor, the residuals in the observed galaxy sizes can
potentially be used for multiple cosmological applications, including making
measurements of galaxy velocities in spectroscopic samples, estimating the rate
of cosmic expansion, and constraining the uncertainties in the photometric
redshifts of galaxies. Using projection pursuit regression, the model
accurately predicts intrinsic galaxy sizes, with residuals that show only
limited correlation with galaxy properties. The model decreases the spatial
correlation of galaxy size residuals by a factor of 5 at small scales
compared to the baseline correlation when the mean size is used as a predictor.
Comment: 16 pages, 12 figures, MNRAS
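A single term of projection pursuit regression fits a ridge function y = g(w . x): a direction w and a smooth univariate g, estimated together. The sketch below uses invented data, a cubic polynomial for g, and Nelder-Mead over w; a full PPR implementation would add terms iteratively and use a more flexible smoother.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Mock data: the response depends nonlinearly on one direction in a small
# feature space, i.e. a single ridge function plus noise
X = rng.normal(size=(400, 3))
w_true = np.array([0.8, 0.6, 0.0])
y = np.tanh(X @ w_true) + 0.05 * rng.normal(size=400)

def loss(w):
    # Given a direction w, fit a cubic ridge function g to the projection
    # and score the residual; projection pursuit alternates exactly this way
    t = X @ (w / np.linalg.norm(w))
    g = np.polyval(np.polyfit(t, y, 3), t)
    return ((y - g) ** 2).mean()

res = minimize(loss, x0=np.array([1.0, 0.0, 0.0]), method="Nelder-Mead")
w_hat = res.x / np.linalg.norm(res.x)   # recovered direction (up to sign)
```

The recovered direction aligns with the true one, and the residuals after subtracting g(w . x) are what would be examined for correlations with other galaxy properties.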
A Statistical Method for Estimating Luminosity Functions using Truncated Data
The observational limitations of astronomical surveys lead to significant
statistical inference challenges. One such challenge is the estimation of
luminosity functions given redshift and absolute magnitude measurements
from an irregularly truncated sample of objects. This is a bivariate density
estimation problem; we develop here a statistically rigorous method which (1)
does not assume a strict parametric form for the bivariate density; (2) does
not assume independence between redshift and absolute magnitude (and hence
allows evolution of the luminosity function with redshift); (3) does not
require dividing the data into arbitrary bins; and (4) naturally incorporates a
varying selection function. We accomplish this by decomposing the bivariate
density into nonparametric and parametric portions. There is a simple way of
estimating the integrated mean squared error of the estimator; smoothing
parameters are selected to minimize this quantity. Results are presented from
the analysis of a sample of quasars.
Comment: 30 pages, 9 figures, Accepted for publication in ApJ
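The effect of a varying selection function can be seen in a toy truncated sample. This sketch uses simple inverse-selection weighting rather than the semiparametric decomposition the abstract describes, and every distribution and survey limit in it is invented for illustration.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(4)

# Mock "complete" population: redshift z and absolute magnitude M
n = 4000
z = rng.uniform(0.1, 2.0, n)          # true z-distribution: uniform
M = rng.normal(-22.0, 1.0, n)         # true M-distribution: N(-22, 1)

# Flux-limited truncation: intrinsically faint objects are missed at high z
M_lim = -18.0 - 2.5 * z               # hypothetical survey limit
obs = M < M_lim
z_obs = z[obs]

def selection(zz):
    # Probability that an object at redshift zz passes the magnitude cut,
    # i.e. P(M < M_lim(zz)) under the N(-22, 1) magnitude distribution
    return 0.5 * (1.0 + erf(((-18.0 - 2.5 * zz) + 22.0) / np.sqrt(2.0)))

# Inverse-selection weighting: each observed object stands in for
# 1/selection objects in the complete population
w = np.array([1.0 / selection(zz) for zz in z_obs])
hist_raw, edges = np.histogram(z_obs, bins=5, range=(0.1, 2.0))
hist_w, _ = np.histogram(z_obs, bins=5, range=(0.1, 2.0), weights=w)
```

The raw redshift histogram falls off where truncation bites, while the weighted one approximately recovers the uniform parent distribution; the paper's estimator builds this correction into the bivariate density itself rather than applying it per bin.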
How to Optimally Constrain Galaxy Assembly Bias: Supplement Projected Correlation Functions with Count-in-cells Statistics
Most models for the connection between galaxies and their haloes ignore the
possibility that galaxy properties may be correlated with halo properties other
than mass, a phenomenon known as galaxy assembly bias. Yet, it is known that
such correlations can lead to systematic errors in the interpretation of survey
data. At present, the degree to which galaxy assembly bias may be present in
the real Universe, and the best strategies for constraining it remain
uncertain. We study the ability of several observables to constrain galaxy
assembly bias from redshift survey data using the decorated halo occupation
distribution (dHOD), an empirical model of the galaxy--halo connection that
incorporates assembly bias. We cover an expansive set of observables, including
the projected two-point correlation function w_p(r_p), the galaxy--galaxy
lensing signal ΔΣ(r_p), the void probability function VPF(r), the
distributions of counts-in-cylinders, P(N_CIC), and counts-in-annuli,
P(N_CIA), and the distribution of the ratio of counts in cylinders of
different sizes. We find that despite the frequent use of the combination of
w_p(r_p) and ΔΣ(r_p) in interpreting galaxy data, the count statistics,
P(N_CIC) and P(N_CIA), are generally more efficient in constraining galaxy
assembly bias when combined with w_p(r_p). Constraints based upon w_p(r_p)
and ΔΣ(r_p) share common degeneracy directions in the parameter space, while
combinations of w_p(r_p) with the count statistics are more complementary.
Therefore, we strongly suggest that count statistics should be used to
complement the canonical observables in future studies of the galaxy--halo
connection.
Comment: Figures 3 and 4 show the main results. Published in Monthly Notices
of the Royal Astronomical Society
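The counts-in-cylinders statistic itself is simple to compute: for each galaxy, count companions inside a cylinder of fixed projected radius and line-of-sight depth, then histogram the counts. The sketch below does this brute-force on a uniform mock catalogue in a periodic box; the box size, cylinder dimensions, and random positions are all arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(5)

# Mock galaxy catalogue in a periodic box (arbitrary units); a real analysis
# would use redshift-space positions from a survey or a dHOD mock
L, n = 100.0, 1000
pos = rng.uniform(0, L, size=(n, 3))

def counts_in_cylinders(pos, r_p=2.0, half_depth=10.0, box=100.0):
    # For each galaxy, count companions inside a cylinder of projected
    # radius r_p and half-length half_depth along the z (line-of-sight) axis
    d = pos[:, None, :] - pos[None, :, :]
    d -= box * np.round(d / box)               # minimum-image periodic wrap
    proj = np.hypot(d[..., 0], d[..., 1])
    in_cyl = (proj < r_p) & (np.abs(d[..., 2]) < half_depth)
    return in_cyl.sum(axis=1) - 1              # exclude the galaxy itself

n_cic = counts_in_cylinders(pos)
p_ncic = np.bincount(n_cic) / len(n_cic)       # the distribution P(N_CIC)
```

Unlike the two-point function, the full shape of P(N_CIC) retains higher-order clustering information, which is why it adds constraining power on assembly bias.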
Semi-supervised Learning for Photometric Supernova Classification
We present a semi-supervised method for photometric supernova typing. Our
approach is to first use the nonlinear dimension reduction technique diffusion
map to detect structure in a database of supernova light curves and
subsequently employ random forest classification on a spectroscopically
confirmed training set to learn a model that can predict the type of each newly
observed supernova. We demonstrate that this is an effective method for
supernova typing. As supernova numbers increase, our semi-supervised method
efficiently utilizes this information to improve classification, a property not
enjoyed by template based methods. Applied to supernova data simulated by
Kessler et al. (2010b) to mimic those of the Dark Energy Survey, our methods
achieve (cross-validated) 95% Type Ia purity and 87% Type Ia efficiency on the
spectroscopic sample, but only 50% Type Ia purity and 50% efficiency on the
photometric sample due to their spectroscopic follow-up strategy. To improve
the performance on the photometric sample, we search for better spectroscopic
follow-up procedures by studying the sensitivity of our machine learned
supernova classification on the specific strategy used to obtain training sets.
With a fixed amount of spectroscopic follow-up time, we find that deeper
magnitude-limited spectroscopic surveys are better for producing training sets.
For supernova Ia (II-P) typing, we obtain a 44% (1%) increase in purity to 72%
(87%) and 30% (162%) increase in efficiency to 65% (84%) of the sample using a
25th (24.5th) magnitude-limited survey instead of the shallower spectroscopic
sample used in the original simulations. When redshift information is
available, we incorporate it into our analysis using a novel method of altering
the diffusion map representation of the supernovae. Incorporating host
redshifts leads to a 5% improvement in Type Ia purity and 13% improvement in
Type Ia efficiency.
Comment: 16 pages, 11 figures, accepted for publication in MNRAS
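The diffusion map step can be sketched directly: build a Gaussian affinity between light curves, normalize it into a Markov transition matrix, and embed each object with the leading non-trivial eigenvectors. The synthetic two-class "light curves" below are invented, and a nearest-centroid rule stands in for the random forest classifier the abstract uses.

```python
import numpy as np

rng = np.random.default_rng(6)

# Mock "light curves": two supernova-like classes with different decline
# rates (purely synthetic stand-ins for real photometry)
t = np.linspace(0.0, 1.0, 30)
def curve(rate):
    return np.exp(-rate * t) + 0.05 * rng.normal(size=t.size)
X = np.array([curve(2.0) for _ in range(60)] + [curve(6.0) for _ in range(60)])
y = np.array([0] * 60 + [1] * 60)

# Diffusion map: Gaussian affinity -> Markov matrix -> spectral embedding
eps = 0.5
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
P = np.exp(-d2 / eps)
P /= P.sum(axis=1, keepdims=True)
vals, vecs = np.linalg.eig(P)
order = np.argsort(-vals.real)
embed = (vecs.real * vals.real)[:, order[1:4]]  # skip the constant eigenvector

# Classify in diffusion space with a nearest-centroid rule (a simple
# stand-in for the paper's random forest)
idx = rng.permutation(120)
idx_tr, idx_te = idx[:60], idx[60:]
cents = np.array([embed[idx_tr][y[idx_tr] == c].mean(0) for c in (0, 1)])
pred = ((embed[idx_te][:, None, :] - cents) ** 2).sum(-1).argmin(1)
acc = (pred == y[idx_te]).mean()
```

Because the embedding is driven by the geometry of the light-curve set rather than by templates, new unlabeled supernovae refine the representation itself, which is the semi-supervised property the abstract highlights.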